GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB —Extended Abstract—
Abstract
We address the highly challenging problem of real-time 3D hand tracking from a monocular RGB-only sequence. Our tracking method combines a convolutional neural network (CNN) with a kinematic 3D hand model, so that it generalizes well to unseen data, is robust to occlusions and varying camera viewpoints, and produces hand motions that are both anatomically plausible and temporally smooth. To enhance the training data for our CNN, we propose a geometrically consistent image-to-image translation network. More specifically, we use a neural network that translates synthetic images to “real” images, such that the generated images follow the same statistical distribution as real-world hand images while preserving geometric properties (such as hand pose). We demonstrate that our hand tracking system outperforms the current state of the art on challenging RGB-only footage.
Similar resources
GANerated Hands for Real-time 3D Hand Tracking from Monocular RGB
We address the highly challenging problem of real-time 3D hand tracking based on a monocular RGB-only sequence. Our tracking method combines a convolutional neural network with a kinematic 3D hand model, such that it generalizes well to unseen data, is robust to occlusions and varying camera viewpoints, and leads to anatomically plausible as well as temporally smooth hand motions. For training ...
Using a single RGB frame for real time 3D hand pose estimation in the wild
We present a method for the real-time estimation of the full 3D pose of one or more human hands using a single commodity RGB camera. Recent work in the area has displayed impressive progress using RGBD input. However, since the introduction of RGBD sensors, there has been little progress for the case of monocular color input. We capitalize on the latest advancements of deep learning, combining ...
GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB –Supplementary Document–
Network Design: The architecture of GeoConGAN is based on CycleGAN [13], i.e. we train two conditional generator networks and two discriminator networks, for synthetic and real images, respectively. Recently, methods that use only one generator and one discriminator to enrich synthetic images from unpaired data have also been proposed. Shrivastava et al. [9] and Liu et al. [5] both employ an L1 loss...
3D Hand Pose Detection in Egocentric RGB-D Images
We focus on the task of everyday hand pose estimation from egocentric viewpoints. For this task, we show that depth sensors are particularly informative for extracting near-field interactions of the camera wearer with his/her environment. Despite the recent advances in full-body pose estimation using Kinect-like sensors, reliable monocular hand pose estimation in RGB-D images is still an unsolv...
Real-Time Joint Tracking of a Hand Manipulating an Object from RGB-D Input
Real-time simultaneous tracking of hands manipulating and interacting with external objects has many potential applications in augmented reality, tangible computing, and wearable computing. However, due to difficult occlusions, fast motions, and uniform hand appearance, jointly tracking hand and object pose is more challenging than tracking either of the two separately. Many previous approaches...